Feature/4.0/helm installer#1064
Open
jorgemoralespou wants to merge 152 commits into
Adds the v4 architecture-of-record under docs/architecture/:
- educates-current-state.md — what v3 actually is today.
- educates-v4-development-plan.md — phased plan + open items + the
pre-phase chart workstream this commit-set lands.
- educates-crd-draft-v1alpha1-r3.md — operator CRD design (informs
Phase 0+).
- decisions.md — append-only decisions log; entries grouped by
topic with reconsider triggers where relevant.
CLAUDE.md is the briefing for any future Claude Code session in this
repo: scope of v4 vs v3, what's safe to touch, working norms,
references back to the architecture docs.
.gitignore picks up .claude/ so user-local agent state doesn't leak.
Pre-phase deliverable from docs/architecture/educates-v4-development-plan.md.
The chart is the canonical Helm install for the Educates v4 runtime —
what users helm install today (and what the v4 operator will install
on their behalf in Phase 4).
Layout:
installer/charts/educates-training-platform/
├── Chart.yaml, values.yaml, .helmignore (umbrella)
└── charts/
├── secrets-manager/ (CRDs + operator: cross-NS secret
│ propagation primitives)
├── lookup-service/ (federation API; off by default)
├── remote-access/ (read-only RBAC + token Secret for
│ external CLI clients; toggleable
│ independently)
└── session-manager/ (workshop runtime + bundled v3
Kyverno policies)
CRDs ship in each subchart's `crds/` directory rather than templates/
because Helm can't apply CRDs and CRs of those CRDs in a single
release otherwise (see decisions.md).
session-manager ships v3-vendored Kyverno policies on two paths:
- bundledKyvernoPolicies.clusterPolicies — Pod Security Standards
profiles installed cluster-wide as ClusterPolicy resources.
- bundledKyvernoPolicies.workshopPolicies — operational policies +
Educates-internal `require-ingress-session-name`, written into
the educates-config Secret for session-manager to clone per
workshop environment.
additionalKyvernoPolicies.{clusterPolicies,workshopPolicies} let
admins extend either bundle through chart values — net-new vs v3,
which required out-of-band kubectl-apply.
Image refs default to ghcr.io/educates/educates-* (v3 naming
convention). Tag default is the chart's appVersion; scenarios pin to
3.7.1 because no v4 runtime images exist yet.
…lates
Stray file from troubleshooting .Files.Glob behaviour in
kyverno-cluster-policies.yaml during chart development. Helm treats
templates/_*.txt as a partial-style include and skips it during render,
so it never affected output, but it shouldn't be in the chart.
Six end-to-end scenarios under installer/charts/educates-training-platform/tests/.
Each scenario provisions a kind cluster (cluster-only), runs an
optional pre-install hook to stage cluster-side fixtures, applies the
chart, runs an optional post-install hook before rollout-status, and
exercises the deploy-workshop / browse path.
run-scenario.sh wires four hook points (pre-install, post-install,
post-deploy, teardown) so each scenario carries its own setup +
assertions. .env loading + envsubst rendering of scenario files lets
the runner pick up DOMAIN, TLS_CERT_PATH, CA_CERT_PATH from the user
shell (handy with mkcert auto-detection) without touching scenario
files.
Scenarios:
01-local-http-nip-io — minimal HTTP smoke; everything
optional off.
02-kind-tls-wildcard — offline-generated wildcard +
secretPropagation.upstream paths.
03-kind-cert-manager-issuer — cert-manager + certs package
issues the wildcard from a
user-provided CA.
04-website-theme — custom websiteTheme reaches the
live portal HTML (post-deploy
curls portal URL, greps marker).
05-image-pull-secrets — auth'd local registry serves a
copy of educates-session-manager;
scenario fails if the chart's
pull-secret chain breaks.
06-additional-kyverno-policies — bundled + user-supplied
ClusterPolicies on both paths
(cluster-wide + per-environment).
Step 5 prints a click-through portal URL with URL-encoded password
(uses a pure-bash percent-encoder).
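For readers unfamiliar with why the password needs encoding, here is a minimal Python sketch of the same idea. It is not the runner's bash implementation; the `portal_login_url` name and the `password` query parameter are hypothetical, chosen only for illustration.

```python
from urllib.parse import quote


def portal_login_url(base_url: str, password: str) -> str:
    """Build a click-through URL with the password percent-encoded.

    The function name and the "password" query parameter are assumptions
    for this sketch, not the runner's actual URL shape.
    """
    # safe="" forces every reserved character (including "/") to be
    # encoded, so the password cannot be misread as URL structure.
    return f"{base_url}/?password={quote(password, safe='')}"


if __name__ == "__main__":
    print(portal_login_url("http://portal.example.test", "p@ss w/rd&x=1"))
```

Without the encoding step, characters such as `&` or `=` in a generated password would split the query string and the link would log in with a truncated value.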
Add the docs-of-record for the session-manager subchart's typed values shape (values.yaml + JSON schema), and update the v4 development plan to mark Kyverno-policy bundling as done and rescope the typed-runtime-config follow-up against the broader target shape.
Refactor the session-manager subchart's values surface from an opaque
config blob plus ad-hoc toggles into the typed shape defined by the
docs-of-record (docs/architecture/session-manager-chart-values{.yaml,
-schema.json}). Adds values.schema.json so Helm catches shape errors
at template time.
Key changes:
- clusterIngress / clusterSecurity / workshopSecurity replace the
config: clusterIngress + bundledKyvernoPolicies + openshift.enabled
knobs. policyEngine and rulesEngine are PascalCase enums in chart
values, lowercased when emitted into the runtime config blob to
match the runtime's existing expectations (no runtime change).
- Promotes imageRegistry, imageVersions, sessionCookies, clusterStorage,
clusterRuntime, clusterNetwork, dockerDaemon, workshopAnalytics, and
websiteStyling.{defaultTheme,frameAncestors} out of config into typed
values; chart auto-injects operator.namespace (release ns) and
version (chart appVersion).
- websiteStyling.inline.{workshopDashboard,workshopInstructions,
workshopStarted,workshopFinished,trainingPortal} replaces the flat
websiteTheme map; the chart maps the structured triples to the
flat secret keys (.html / .js / .css) the runtime expects.
- SecretCopier rules for ingress TLS+CA are auto-derived from
clusterIngress.{tls,ca}CertificateRef.namespace; the explicit
secretPropagation.upstream.ingressTLS / ingressCA knobs are gone.
themeDataRefs entries with non-local namespaces are also auto-copied.
- Materialises empty-string TLS/CA refs in the operator-config blob
even when unset, papering over the runtime's xget-no-default-None
quirk.
- config remains as an opaque escape hatch, deep-merged on top of the
typed-derived blob so users can land new fields before promotion.
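The escape-hatch behaviour in the last bullet, together with the enum lowercasing from the first, can be sketched in Python as follows. This is an illustration of the semantics only, not the chart's template code; the sample values (including the 1400 MTU default) are made up for the example.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return base with override merged on top, recursing into dicts.

    On a conflict the override wins; keys only present in the override
    are passed through unchanged.
    """
    result = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


# Typed values use PascalCase enums; the runtime blob gets them lowercased.
typed_derived = {
    "clusterSecurity": {"policyEngine": "Kyverno".lower()},
    "dockerDaemon": {"networkMTU": 1400},  # illustrative default
}

# Opaque `config:` map supplied by the user.
config = {
    "dockerDaemon": {"networkMTU": 1200},
    "experimental": {"markerKey": "on"},
}

merged = deep_merge(typed_derived, config)
```

In this sketch `merged` keeps `clusterSecurity.policyEngine` as `kyverno`, takes the user's `networkMTU`, and carries `experimental.markerKey` through even though no typed value knows about it.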
Scenarios 01 (local HTTP nip.io) and 02 (kind + TLS wildcard) are
updated to the typed shape and verified via helm template. The
remaining scenarios (03-06) move to the typed shape in a follow-up
commit.
Brings scenarios 03-06 onto the typed values shape introduced in the
previous commit and adds scenario 07 to exercise the `config:` escape-
hatch deep-merge.
- 03 (kind cert-manager issuer): cross-namespace TLS/CA refs now drive
the SecretCopier auto-derive instead of an explicit
`secretPropagation.upstream.ingressTLS/ingressCA` block.
- 04 (website theme): uses `websiteStyling.inline.trainingPortal.{html,
style}`, exercising the chart's structured-triple → flat-secret-key
mapping (`style` → `.css`).
- 05 (image-pull secrets): typed top-level `imagePullSecrets` for the
PodSpec; `secretPropagation.imagePullSecretNames` and
`secretPropagation.upstream.imagePullSecrets` unchanged.
- 06 (additional Kyverno policies): toggles renamed —
`clusterSecurity.{policyEngine,additionalKyvernoPolicies}` and
`workshopSecurity.{rulesEngine,additionalKyvernoPolicies}` replace
the old `bundledKyvernoPolicies` / `additionalKyvernoPolicies.{cluster,
workshop}Policies` blocks.
- 07 (new): asserts the `config:` opaque map deep-merges on top of the
typed-derived runtime config and wins on conflict (`dockerDaemon.
networkMTU` override) and passes through unknown fields untouched
(`experimental.markerKey`).
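The structured-triple to flat-secret-key mapping exercised by scenario 04 could look roughly like the Python sketch below. Only `html`, `style` and the `.html` / `.js` / `.css` suffixes come from the commit text; the `script` field name and the use of the component name as the key prefix are assumptions.

```python
# Suffix emitted for each field of a styling triple. `script` is an
# assumed field name; the commit only names `html` and `style`.
SUFFIX_BY_FIELD = {"html": ".html", "script": ".js", "style": ".css"}


def flatten_theme(inline: dict) -> dict:
    """Map {component: {field: content}} to flat {component + suffix: content}.

    The key prefix (the component name as-is) is an assumption; the real
    chart may use a different prefix for the secret keys.
    """
    flat = {}
    for component, triple in inline.items():
        for field, content in triple.items():
            flat[component + SUFFIX_BY_FIELD[field]] = content
    return flat


if __name__ == "__main__":
    print(flatten_theme({"trainingPortal": {"html": "<p>hi</p>", "style": "body{}"}}))
```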
Drive-by: schema's top-level `imagePullSecrets` was [string] but the
PodSpec wants the standard k8s [{name: ...}] shape — fixed in both the
chart's values.schema.json and the docs-of-record.
Records the decision behind the typed-values refactor (commits a57ef864 / d339c016 / f61d3c8e): a single typed surface serves both the operator and standalone chart users, with `config:` retained as an escape hatch for not-yet-promoted runtime fields. The earlier "operator-driven, not v3-driven" decision is superseded — its framing left standalone users writing opaque YAML against the v3 schema from memory, which contradicted the chart's own publish-as-canonical-Helm install positioning.
…default imageVersions
Fixes ErrImagePull on training-portal (and any other runtime-spawned
child image) by stopping the chart from auto-injecting Chart.AppVersion
(`4.0.0-alpha.1`) as the runtime config's `version` field. Restores the
defaulted `imageVersions` list that v3's carvel installer rendered out
of `config/images.yaml`, which the previous refactor silently dropped.
- New typed value `runtimeVersion: "3.7.1"` drives:
* the chart-pod `image.tag` default
* the `imagePuller.pauseImage.tag` default
* the `version` field auto-injected into the operator-config blob
- `imageVersions` defaults to the full v3 list: 12 Educates-published
images pinned to `runtimeVersion`, 7 upstream pins (docker-in-docker,
loftsh-kubernetes-v1.31..34, loftsh-vcluster, debian-base-image) at
their v3-vendored tags. Visible in values.yaml so chart users see
exactly which images the runtime can pull.
- Drops the redundant `image.tag: "3.7.1"` overrides in scenarios 01-07
(chart default now resolves correctly). Scenario 05 keeps its
`image.repository` override for the auth'd local registry but no
longer pins the tag.
- Mirrored in docs-of-record (values.yaml + JSON schema), the v4 dev
plan, and a new decisions.md entry covering the rationale and the
documented two-place-edit when bumping `runtimeVersion`.
…helper-defaulted imageVersions with per-key merge
Supersedes the runtimeVersion + populated-values approach from e71e41af
with a more idiomatic shape:
- `Chart.appVersion` IS the runtime image version. Set to "3.7.1" across
the umbrella + four subchart Chart.yamls (was "4.0.0-alpha.1", which
doesn't exist as a published image and was the original source of the
ErrImagePull). `Chart.version` stays at 4.0.0-alpha.1 (the
chart-package version). They normally move together at release time;
the field separation exists for chart-only patches.
- Removed `runtimeVersion` typed value. Image-tag defaults
(`session-manager.image.tag`, `imagePuller.pauseImage.tag`) and the
operator-config blob's `version` field all source `Chart.appVersion`
directly.
- The full default `imageVersions` set moves into a new template helper
`session-manager.imageVersions` (mirroring v3's carvel-installer
images.yaml). Educates-published entries derive their tag from
`Chart.appVersion`; upstream pins (docker-in-docker, loftsh-*,
debian-base-image) are hard-coded.
- User-supplied `.Values.imageVersions` entries merge BY NAME on top of
the helper defaults: an override replaces just the matching default's
image, other defaults pass through, names not in the default list are
appended (forward-compat). Strictly better UX than v3's full-list
replacement; chart users override only what they need.
- `values.yaml`, `values.schema.json`, and the docs-of-record return to
`imageVersions: []` with comments documenting the per-key merge
semantics. The helper is the documented inventory.
- Dev plan updated. decisions.md entry replaced with the corrected
rationale (Chart.appVersion sourcing, helper-defaulted list, per-key
merge UX).
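The merge-by-name semantics for `imageVersions` can be sketched in Python as below. This illustrates the described behaviour and is not the Helm helper itself; the image names and tags in the usage example are placeholders.

```python
def merge_image_versions(defaults: list, overrides: list) -> list:
    """Merge override entries onto default entries, keyed by `name`.

    A matching name replaces only that entry's image, untouched defaults
    pass through in their original order, and unknown names are appended.
    """
    merged = {entry["name"]: entry["image"] for entry in defaults}
    for entry in overrides:
        # Assigning to an existing key keeps its position; a new key is
        # appended, because dicts preserve insertion order.
        merged[entry["name"]] = entry["image"]
    return [{"name": name, "image": image} for name, image in merged.items()]


if __name__ == "__main__":
    defaults = [
        {"name": "base-environment", "image": "registry.test/base:3.7.1"},
        {"name": "docker-in-docker", "image": "registry.test/dind:1"},
    ]
    overrides = [
        {"name": "docker-in-docker", "image": "mirror.test/dind:1"},
        {"name": "future-image", "image": "mirror.test/future:0"},
    ]
    for entry in merge_image_versions(defaults, overrides):
        print(entry)
```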
…-pod, pause, and runtime children
A chart user pointing at a fork or a local registry now redirects
every Educates-image reference with one knob. Previously only the
runtime-spawned children were derivable from imageRegistry — the
chart pod and the pause image were hard-coded to ghcr.io/educates/...,
which broke the dev workflow against a fork.
- `imageRegistry.host` / `.namespace` default to `ghcr.io` / `educates`
(was empty/empty). They now compose the prefix used by:
* `session-manager.imageRegistryPrefix` helper (new)
* the chart-pod `image.repository` default (when empty)
* the pause image `imagePuller.pauseImage.repository` default
(when empty)
* the Educates-published entries in the `session-manager.imageVersions`
helper
- Upstream pins (docker-in-docker, loftsh-*, debian-base-image) are NOT
relocated by imageRegistry — they're public upstream images that don't
follow the educates-<name> naming convention. Mirror them via per-entry
imageVersions overrides instead.
- `image.repository` and `imagePuller.pauseImage.repository` defaults
flip from hard-coded refs to empty strings; helpers `session-manager.
image.repository` and `session-manager.pause.image.repository` resolve
the empty-derives-from-imageRegistry behaviour. Schema's `imageRef`
drops the minLength on `repository` to allow the empty.
- Helper bails fast (`fail`) if imageRegistry.host is empty — the
chart can't compose a default ref without it.
- Verified end-to-end: setting `imageRegistry: {host: localhost:5001,
namespace: educates-fork}` redirects 12 runtime children + chart pod
+ pause to that prefix, while upstream pins stay put.
Mirrored in docs-of-record (values.yaml + JSON schema), v4 dev plan,
and the prior decisions.md entry.
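As a rough Python sketch of the empty-derives-from-imageRegistry behaviour: the function names below are hypothetical stand-ins for the Helm helpers, and the handling of an empty namespace is an assumption the commit does not spell out.

```python
def image_registry_prefix(registry: dict) -> str:
    """Compose `host/namespace`, failing fast when host is empty."""
    host = registry.get("host", "")
    if not host:
        # Mirrors the helper's `fail`: no default ref can be composed.
        raise ValueError("imageRegistry.host must not be empty")
    namespace = registry.get("namespace", "")
    # Assumption: an empty namespace yields just the host.
    return f"{host}/{namespace}" if namespace else host


def resolve_repository(repository: str, registry: dict, name: str) -> str:
    """Use an explicit repository if given, else derive it from the registry."""
    if repository:
        return repository
    return f"{image_registry_prefix(registry)}/educates-{name}"


if __name__ == "__main__":
    fork = {"host": "localhost:5001", "namespace": "educates-fork"}
    print(resolve_repository("", fork, "session-manager"))
```

Upstream pins are deliberately outside this path: they do not follow the `educates-<name>` convention, so they are only relocated through per-entry `imageVersions` overrides.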
…curity to umbrella globals
Cross-cutting deployment-scope values now live at the umbrella under
`global:`. Helm propagates them to every subchart as `.Values.global.<key>`.
Subcharts retain a local block of the same name with sensible defaults;
new helpers deep-merge the umbrella global over the subchart local, with
globals winning per-leaf where set. Subcharts remain independently
installable.
Concretely:
- session-manager: new `resolved{ImageRegistry,ClusterIngress,
ClusterSecurity}` helpers feed `imageRegistryPrefix`, `derivedProtocol`,
`operatorConfigYAML`, `kyverno-cluster-policies.yaml`,
`clusterrolebindings.yaml`, and `secretcopiers.yaml`. Schema drops
`clusterIngress` and `clusterSecurity` from `required` and removes the
subchart-local `clusterIngress.domain` minLength — helpers do the
post-merge `fail` instead.
- lookup-service: new `imageRegistry` block (default ghcr.io/educates),
`image.repository` flips to empty (derives from imageRegistry),
helpers added (`resolvedImageRegistry`, `imageRegistryPrefix`,
`image.repository`). Wired into the Deployment. New
`values.schema.json`.
- secrets-manager: same image-helper additions. `openshift.enabled`
removed; the SCC ClusterRoleBinding now gates on
`clusterSecurity.policyEngine == "OpenShiftSCC"` (resolved from
globals when present). New `values.schema.json`.
- Umbrella `values.yaml` gains a `global:` block with commented examples
for the three keys.
- All seven scenarios converted to canonical-globals shape: cross-
cutting values live under `global:`, subchart blocks shrink to per-
subchart concerns only. Verified end-to-end — setting
`global.imageRegistry.{host,namespace}` redirects all chart pods +
runtime children; `global.clusterSecurity.policyEngine: OpenShiftSCC`
triggers SCC bindings in BOTH session-manager and secrets-manager.
Mirrored in the doc-of-record (note about dual-source pattern), the v4
dev plan (each cross-cutting block now flagged as a global), and a new
decisions.md entry covering the rationale and trade-offs.
Drive-by: `tests` added to .helmignore so scenario fixtures don't ship
with the chart package.
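The dual-source pattern (subchart-local defaults, umbrella globals winning per leaf where set, then a post-merge check) can be sketched in Python like this. It is an illustration under one stated assumption: a global leaf counts as "unset" when it is missing, `None`, or an empty string.

```python
def resolve(local: dict, global_block: dict) -> dict:
    """Overlay global values on subchart-local values, leaf by leaf."""
    result = dict(local)
    for key, gval in (global_block or {}).items():
        lval = result.get(key)
        if isinstance(gval, dict) and isinstance(lval, dict):
            result[key] = resolve(lval, gval)
        elif gval not in (None, ""):
            # Assumption: None and "" mean "not set at the umbrella".
            result[key] = gval
    return result


def resolved_cluster_ingress(values: dict) -> dict:
    """Resolve clusterIngress, then apply the post-merge domain check."""
    merged = resolve(
        values.get("clusterIngress", {}),
        values.get("global", {}).get("clusterIngress", {}),
    )
    if not merged.get("domain"):
        raise ValueError("clusterIngress.domain must be set (local or global)")
    return merged


if __name__ == "__main__":
    values = {
        "clusterIngress": {"domain": "", "tlsCertificateRef": {"name": "local-tls"}},
        "global": {"clusterIngress": {"domain": "example.test"}},
    }
    print(resolved_cluster_ingress(values))
```

Doing the required-field check after the merge is what lets the schema drop `clusterIngress.domain` from the subchart-local constraints: the value may legitimately arrive from either source.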
… render ca-trust-store init container
Drops lookup-service's specialised `caTrust` block and `ingress.tls`
field in favour of consuming `global.clusterIngress` (with subchart-
local fall-back). The lookup-service Ingress's TLS Secret now derives
from the resolved `clusterIngress.tlsCertificateRef.name` (typically
the wildcard cert covering `*.<domain>`), and the chart renders a
ca-trust-store init container when `clusterIngress.caCertificateRef.name`
is set.
- New `clusterIngress` block in lookup-service values.yaml mirrors the
shape introduced in session-manager + the umbrella global.
- `caTrust.{secretName,initImage}` removed; the init image is no longer
base-environment but the lookup-service main image itself (Fedora-
based: has `update-ca-trust` and `tar`). Zero extra image pulls; the
kubelet already has it on the node. Mirrors v3's
overlay-ca-injector.yaml mechanism without the cost of pulling a
multi-GB workshop image.
- `ingress.tls.secretName` removed; the Ingress derives TLS from the
resolved `clusterIngress.tlsCertificateRef`.
- New `secretcopiers.yaml` auto-derives copy rules for both the TLS
Secret and the CA Secret when their refs target a foreign namespace.
Renders independently of session-manager's SecretCopier so this chart
is installable standalone; under the umbrella both subcharts render
their own rules (idempotent — same source-Secret copied once
regardless of how many rules reference it).
- Helpers updated: drop `caTrust.image.{tag,pullPolicy}`, add
`resolvedClusterIngress` and `caTrustEnabled`.
- Schema reflects the new shape.
Verified by enabling lookup-service against a TLS+CA scenario:
- Ingress emits `tls: [{secretName: wildcard-tls}]` with the
hostname-specific `host` and the resolved cert name.
- Init container reuses the lookup-service image and runs
`update-ca-trust && tar -C /etc/pki/ca-trust ...`.
- Main container mounts the CA-populated trust store at
/etc/pki/ca-trust read-only.
- SecretCopier `educates-lookup-service-ingress-secrets` pulls both
refs into the release namespace.
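The auto-derive rule for copy rules (only refs that target a foreign namespace need copying) can be sketched in Python as below. The returned dict shape is illustrative only and is not the SecretCopier CRD schema; the `wildcard-ca` Secret name is a placeholder.

```python
def derive_copy_rules(refs: list, release_namespace: str) -> list:
    """Return one copy rule per distinct foreign-namespace Secret ref.

    Refs with no name, no namespace, or a namespace equal to the release
    namespace need no copy. The rule dict shape here is hypothetical.
    """
    rules = []
    seen = set()
    for ref in refs:
        name = ref.get("name", "")
        namespace = ref.get("namespace", "")
        if not name or not namespace or namespace == release_namespace:
            continue
        if (namespace, name) in seen:
            continue  # the same source Secret is only listed once
        seen.add((namespace, name))
        rules.append({
            "source": {"name": name, "namespace": namespace},
            "targetNamespace": release_namespace,
        })
    return rules


if __name__ == "__main__":
    refs = [
        {"name": "wildcard-tls", "namespace": "educates-secrets"},
        {"name": "wildcard-ca", "namespace": "educates-secrets"},
    ]
    print(derive_copy_rules(refs, "educates"))
```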
…ger Deployment
Mirrors what step 2 added to lookup-service: a chart-side ca-trust-store
init container that builds a CA-populated trust store from
`global.clusterIngress.caCertificateRef` and the main container mounts
the result at /etc/pki/ca-trust read-only. Reuses the main
session-manager image (Fedora-based: has `update-ca-trust` and `tar`) so
no extra image pull on the node.
v3 only injected the trust store into lookup-service. Including it in
session-manager too is harmless when the CA isn't needed and avoids
debugging "why does X fail TLS verify?" later if session-manager ever
gains code paths that reach external TLS endpoints fronted by the
private CA.
- New `session-manager.caTrustEnabled` helper + Deployment template
rewire: initContainers / volumes / volumeMount conditionally on the
resolved `clusterIngress.caCertificateRef.name`.
- The init container's securityContext explicitly sets
`runAsNonRoot: false` to override the pod-level `runAsNonRoot: true`
enforcement (the trust-store update needs UID 0 to write
/etc/pki/ca-trust). Mirrored in lookup-service for consistency.
- Verified: scenario 02 (TLS+CA) renders the init container; scenario 01
(no CA) does not; existing SecretCopier auto-derive still pulls the CA
Secret into the release namespace where the init container consumes it.
…chart
Brings remote-access in line with the schema discipline applied to the
other three subcharts (additionalProperties: false; only `enabled` and
the Helm-injected `global` are valid). The subchart has no configurable
knobs in v0.1.0; the schema serves purely as a typo-catcher and a
contract that future additions are deliberate.
All seven scenarios still render; an additional smoke-render with
remote-access.enabled=true also passes.
Restores the v3 cluster-node CA injection feature as a sibling subchart
under the umbrella, replacing the never-rendered
`session-manager.clusterIngress.caNodeInjector.enabled` stub. Toggle is
the umbrella's `node-ca-injector.enabled: false` (default off, opt-in)
via Helm's standard subchart-condition mechanism.
What renders when enabled (mirroring v3's 07-node-ca-injector.yaml):
- ServiceAccount + ClusterRole/Binding (Ingress watch) +
Role/RoleBinding (ConfigMap manage in release ns).
- `node-ca-injector-controller` Deployment running the `controller`
subcommand: watches Ingresses, builds the `educates-registry-hosts`
ConfigMap.
- `node-ca-injector` DaemonSet running the `sync` subcommand:
privileged, mounts the CA Secret + hosts ConfigMap + hostPath
`/etc/containerd/certs.d`. Writes per-host containerd registry-CA
configuration so containerd trusts the cluster's private CA when
pulling images.
- SecretCopier auto-derived when the CA ref's namespace is foreign.
The subchart consumes `global.clusterIngress.caCertificateRef` (with
subchart-local fall-back for standalone) and fails fast at template
time if the resolved CA ref is empty. Image is derived from
`global.imageRegistry` so the same fork/local-registry knob redirects
this subchart too. Has its own `values.schema.json`.
Relationship to the per-pod ca-trust-store init container (steps 2/3):
complementary, not overlapping. Init container handles in-pod TLS verify
for our own Deployments; node-ca-injector handles
container-runtime-level trust for image pulls (including pulls performed
by pods we don't render: third-party operators, docker-in-docker
workshop sessions). Both keyed on the same global CA ref; both
independently togglable.
`session-manager.clusterIngress.caNodeInjector.enabled` is removed from
values.yaml + values.schema.json + the doc-of-record. New decisions.md
entry covers the rationale + the relationship between the two CA-trust
mechanisms.
… top-level shape
Closes the validation gap on the umbrella's cross-cutting `global:`
block. Subchart schemas correctly treat `global` as opaque (they
shouldn't dictate the umbrella's contract), which meant typos like
`global.clusterSecuirty.policyEngine` or `global.imageRegistry.namespece`
silently fell through — every subchart fell back to its local defaults
and the user's intended override was lost.
The umbrella schema:
- Validates the `global.{imageRegistry,clusterIngress,clusterSecurity}`
shape with `additionalProperties: false` at every level.
- Forbids unknown top-level keys (catches misspelled subchart names
like `sesion-manager:` that Helm would otherwise treat as inert).
- Treats each subchart block (`secrets-manager`, `lookup-service`,
`remote-access`, `session-manager`, `node-ca-injector`) as
`{ "type": "object" }` and delegates detailed validation to that
subchart's own schema. No duplication.
Verified: all three classes of typo (misspelled global key, misspelled
global nested field, unknown top-level key) trigger a clear schema
error at `helm template` time. All seven existing scenarios still
render cleanly.
decisions.md entry covers the split rationale.
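To illustrate what `additionalProperties: false` buys here, the following Python sketch walks a values tree against an allow-list and reports unknown keys. It is a toy stand-in for JSON Schema validation, and the `ALLOWED` tree is an abbreviated, partly assumed subset of the umbrella schema.

```python
# None marks a leaf or an opaque block (subcharts validate themselves).
# This tree is abbreviated for the example, not the full umbrella schema.
ALLOWED = {
    "global": {
        "imageRegistry": {"host": None, "namespace": None},
        "clusterIngress": {"domain": None},
        "clusterSecurity": {"policyEngine": None},
    },
    "secrets-manager": None,
    "lookup-service": None,
    "remote-access": None,
    "session-manager": None,
    "node-ca-injector": None,
}


def unknown_keys(values: dict, allowed: dict, path: str = "") -> list:
    """Return dotted paths of keys that the allow-list does not permit."""
    errors = []
    for key, value in values.items():
        here = f"{path}.{key}" if path else key
        if key not in allowed:
            errors.append(here)
        elif isinstance(allowed[key], dict) and isinstance(value, dict):
            errors.extend(unknown_keys(value, allowed[key], here))
    return errors


if __name__ == "__main__":
    typo = {"global": {"clusterSecuirty": {"policyEngine": "OpenShiftSCC"}}}
    print(unknown_keys(typo, ALLOWED))
```

The three typo classes from the commit map to the three places a key can be rejected: directly under `global`, nested inside a global block, and at the top level.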
End-to-end test for node-ca-injector: a workshop session builds a tiny
container image, pushes it to the per-session registry (HTTPS via the
wildcard cert), then creates a Deployment that pulls from that
registry. Successful rollout is the proof — without the cluster CA in
containerd's per-host trust under /etc/containerd/certs.d/, the pull
would fail with TLS verify errors and the rollout would hang.
Runner change:
- run-scenario.sh now detects `<scenario>/workshop/resources/workshop.yaml`
and, if present, publishes the scenario-local workshop to the local
registry stood up by `educates local cluster create` (localhost:5001)
via `educates publish-workshop <scenario>/workshop`, then deploys it
with `educates deploy-workshop -f <that path>`. Scenarios 01-07 keep
the existing default-WORKSHOP_URL behaviour because they don't ship
their own workshop directory.
Scenario 08 (`08-node-ca-injector-image-pull`):
- educates-config.yaml + pre-install.sh: identical to scenario 02
(kind + Contour + Kyverno; pre-install materialises a wildcard TLS
Secret + CA Secret in `educates-secrets`).
- chart-values.yaml: scenario-02 globals shape plus
`node-ca-injector.enabled: true` at the umbrella.
- description.md: explains what the test demonstrates and what success
looks like.
- workshop/: a complete `lab-node-ca-pull` workshop. Enables `docker`
and `registry` session applications. Four content pages walk the user
through writing a Dockerfile, building, tagging and pushing to
`${REGISTRY_HOST}`, creating the Deployment, and watching
`kubectl rollout status` succeed. The summary calls out which step is
the actual proof of node-ca-injector working.
Verified all eight scenarios render cleanly under `helm template`;
scenario 08 emits all eight node-ca-injector resources plus the chart
pods with the ca-trust-store init container.
The runner pause at step 5/6 is the verification surface — interactive
since the proof lives in a workshop session, not in a kubectl assertion
the runner can make against the cluster directly.
…me defaults via Chart.yaml annotations
The user-facing image-registry knob is renamed to
`development.imageRegistry` (subchart-local) and
`global.development.imageRegistry` (umbrella), and the publish-time
default registry moves from a populated `values.yaml` block to
Chart.yaml annotations:
educates.dev/image-registry-host: "ghcr.io"
educates.dev/image-registry-namespace: "educates"
The release workflow updates these annotations per fork (one `yq -i`
call per Chart.yaml) so the chart that gets shipped points at the
right registry without a values override. Mirrors v3's
`push-installer-bundle` Makefile target which baked refs at OCI-bundle
build time, translated to a chart-publish edit step.
The runtime IMAGE_REPOSITORY semantic now matches v3's intent:
- When `development.imageRegistry` is set: emitted into the runtime
config so workshop sessions get IMAGE_REPOSITORY={host}/{namespace}
for `$(image_repository)` content placeholder resolution.
- When empty (normal use): runtime config's `imageRegistry` block is
emitted empty; runtime falls back to
`registry.default.svc.cluster.local` per
`session-manager/handlers/operator_config.py:35`. This avoids
silently breaking the local-dev workflow on installs that left a
populated registry in place.
Implementation notes:
- Each subchart helper now has TWO resolvers:
* `resolvedImageRegistry` falls back to Chart.yaml annotations and
is consumed by chart-rendered + runtime-children image-ref
composition.
* `resolvedDevelopmentImageRegistry` (session-manager only — the
subchart that owns the operator-config emission) does NOT fall
back to annotations; returns user/global only. Emitted into the
runtime config blob's `imageRegistry` field.
- Annotations added to all four image-rendering Chart.yamls (session-
manager, lookup-service, secrets-manager, node-ca-injector). Helper
reads `.Chart.Annotations["educates.dev/image-registry-..."]`.
- Subchart `values.yaml`: `imageRegistry` block dropped; replaced by
empty `development.imageRegistry`. Schemas updated. Doc-of-record
follows the session-manager subchart shape.
- Umbrella `values.yaml` and schema: `global.imageRegistry` →
`global.development.imageRegistry`.
- Helper failure message updated to point at both override paths.
Verified end-to-end:
- Normal mode (scenario 01 with empty development.imageRegistry):
chart pods resolve to `ghcr.io/educates/educates-{secrets,session}-
manager:3.7.1` from annotations; runtime config blob has
`imageRegistry: { host: "", namespace: "" }`.
- Dev override (`global.development.imageRegistry: { host:
localhost:5001, namespace: educates-dev }`): all 12 Educates
imageVersions entries redirect to localhost:5001/educates-dev/
AND the runtime config blob carries the same registry, so workshops
with `$(image_repository)` placeholders resolve consistently.
decisions.md gets a new entry superseding the prior `imageRegistry`
decision with the development-knob framing and the rationale for the
two-resolver split.
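The two-resolver split can be sketched in Python as follows. The function names echo the helper names from the commit, but the bodies are an illustration; in particular, choosing the whole global block when its host is set (rather than merging leaf by leaf) is an assumption.

```python
HOST_KEY = "educates.dev/image-registry-host"
NAMESPACE_KEY = "educates.dev/image-registry-namespace"


def resolved_development_image_registry(local: dict, global_: dict) -> dict:
    """User/global values only; never falls back to Chart.yaml annotations.

    This is what would be emitted into the runtime config blob.
    """
    chosen = global_ if global_.get("host") else local
    return {
        "host": chosen.get("host", ""),
        "namespace": chosen.get("namespace", ""),
    }


def resolved_image_registry(local: dict, global_: dict, annotations: dict) -> dict:
    """Development override if set, else the publish-time annotations.

    This is what image-ref composition would consume.
    """
    dev = resolved_development_image_registry(local, global_)
    if dev["host"]:
        return dev
    return {"host": annotations[HOST_KEY], "namespace": annotations[NAMESPACE_KEY]}


if __name__ == "__main__":
    annotations = {HOST_KEY: "ghcr.io", NAMESPACE_KEY: "educates"}
    print(resolved_development_image_registry({}, {}))
    print(resolved_image_registry({}, {}, annotations))
```

Keeping the annotation fall-back out of the first resolver is what leaves the runtime config's `imageRegistry` empty in normal use while chart pods still resolve to the published registry.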
Captures GitHub-issue drafts for runtime simplifications that should
land once the v4 chart-based install ships in develop. Format mirrors
decisions.md (one heading per issue, prose body, date added) so entries
can be transcribed to the issue tracker with minimal further editing.
Initial entries:
- Simplify `operator_config.py` IMAGE_REPOSITORY resolution. Drop the
`imageRegistry.host` + `imageRegistry.namespace` compose logic in
favour of a single `imageRepository` field, and stop falling through in
`image_reference()` for short-names not in `imageVersions`; treat them
as config errors instead.
- Drop `clusterIngress.tlsCertificate` / `caCertificate` inline forms
from `operator_config.py`. The chart only emits the `*Ref` forms.
- CI lint: assert Chart.yaml annotations stay in sync across all four
image-rendering subcharts (and optionally that version / appVersion /
dependency versions match across umbrella + subcharts).
- Document the chart release workflow's annotation update step in the
release runbook.
Each entry has a "trigger to file" so it doesn't get filed prematurely
while v3 is still the production install path.
Tighten Phase 0 in the v4 development plan and add three decisions-log
entries covering the choices made for kubebuilder bootstrap:
- Operator at installer/operator/, kubebuilder's config/ kustomize tree
stripped; controller-gen writes CRDs and RBAC directly into the
educates-installer Helm chart.
- Spec types adopt the full r3 shape from Phase 0; status grows
alongside the reconciler that produces each field. Avoids dead API
surface drifting from r3.
- Operator image at Phase 0 is a local-dev placeholder built via make
docker-build; publish-time annotations and release workflow are
deferred to Phase 6.
Also narrows Phase 0 CEL scope to singleton-name + mode-immutability
(mode-field exclusivity moves to Phase 1) and Phase 0 RBAC to the four
CRDs only (referenced-resource watches move to Phase 1).
CLAUDE.md gets a new "Operator project (Phase 0+)" block listing the
make targets and conventions.
Phase 0 step 1: bare kubebuilder scaffold at installer/operator/. No
real types or reconciler logic — just the layout we'll grow.
- Multigroup project (config + platform groups under domain
educates.dev), repo path
github.com/educates/educates-training-platform/installer/operator,
added to root go.work.
- Four APIs scaffolded with controller stubs: EducatesClusterConfig
(config/v1alpha1), SecretsManager / LookupService / SessionManager
(platform/v1alpha1).
- Per the Phase 0 layout decision, kubebuilder's config/ kustomize
tree is stripped; controller-gen writes CRDs and RBAC into
bin/manifests/{crd,rbac} for now and will retarget the
educates-installer chart in step 5.
- Makefile pruned of kustomize-dependent targets (install, uninstall,
deploy, undeploy, build-installer, kustomize tool target,
setup-test-e2e/test-e2e, docker-buildx, docker-push) and the
kubebuilder-default test/e2e/ tree removed. smoke-test target is
staged with a fail-fast message until step 5 wires it.
- Operator-local .github/workflows/ removed; the monorepo CI workflow
for the operator lands in step 6.
Verified: go build ./..., go vet ./..., make generate, and make
manifests all run clean. CRDs + RBAC YAML produced for all four kinds.
Translate the full r3 EducatesClusterConfig spec surface into Go types
under api/config/v1alpha1/. Mirrors the CRD draft revision 3 in
docs/architecture/educates-crd-draft-v1alpha1-r3.md:
- Mode (Managed | Inline), with the full Managed-mode tree:
Infrastructure (provider + optional cloud + service-account
identities), Ingress (domain, ingressClassName, controller,
certificates), Certificates (BundledCertManager / ExternalCertManager /
StaticCertificate), ACME with DNS01 solvers (Route53, CloudDNS,
Cloudflare, AzureDNS), DNS (BundledExternalDNS / Manual / None),
PolicyEnforcement (clusterPolicy, workshopPolicy, kyverno),
ImageRegistry (prefix + pullSecrets).
- Inline-mode tree mirroring the same surface where applicable (ingress,
policyEnforcement, imageRegistry).
- Shared OperationalBlock duplicated at every Bundled use site per the
r3 design intent (no schema-ref factoring).
- Static defaults marked with +kubebuilder:default for fields the r3 doc
calls out: dns.provider=None, clusterPolicy.engine=Kyverno,
workshopPolicy.engine=Kyverno, kyverno.provider=Bundled.
- Enum validation on every closed-set field via
+kubebuilder:validation:Enum.
Phase 0 CEL rules added (the only two in scope; mode-field exclusivity
moves to Phase 1 per the development plan):
- Singleton-name on the wrapper type: self.metadata.name == 'cluster'
- Mode immutability on the spec: self.mode == oldSelf.mode
Status surface is intentionally minimal (observedGeneration, phase,
conditions) per the "status grows alongside reconcilers" decision.
CRD shape is now Cluster-scoped with shortName ecc and Mode/Phase/Age
printer columns.
controller-gen output verified: scope: Cluster, both CEL rules present,
four defaults populated, three printer columns, ~1.2k lines of
well-formed YAML. go build, go vet, make generate, make manifests all
pass.
…rom r3
Translate the three platform-group CRDs from r3 into Go types under
api/platform/v1alpha1/. Mirrors the CRD draft revision 3:
- SecretsManager: image override, logLevel (default info), resources.
No replicas knob (singleton at the pod level upstream). Image-pull
credentials inherit from EducatesClusterConfig.status.imageRegistry.
- LookupService: ingress (prefix + optional tlsSecretRef override),
image, logLevel (default info), resources. Component-specific knobs
(auth, rate-limiting, storage) deferred until the lookup-service
owner specifies them.
- SessionManager: ingressOverrides, workshopPolicyOverride, images
(overrides only; registry prefix + pullSecrets inherit from
EducatesClusterConfig.status), themes (ConfigMap/Secret/URL source
type), defaultTheme, tracking (Google Analytics, Amplitude, Clarity,
webhook), defaultAccessCredentials, sessionCookieDomain,
allowedEmbeddingHosts, storage, network (packetSize, blockedCidrs),
imageCache (default disabled), registryMirrors, logLevel.
Shared types in common_types.go: LogLevel, ComponentPhase,
LocalObjectReference, ImageRef. WorkshopPolicyEngine is duplicated in
sessionmanager_types.go to avoid coupling the platform package to the
config API group.
All three CRDs Cluster-scoped with singleton-name CEL
(self.metadata.name == 'cluster') and Phase/Age printer columns.
Status surface intentionally minimal (observedGeneration, phase,
conditions) per the Phase 0 status policy in decisions.md.
go build, go vet, make generate, make manifests all pass. CRDs render
clean (lookup ~257, secrets ~234, session ~431 lines).
…ines
Phase 0 step 5: stand up the educates-installer Helm chart and wire the
four trivial reconcilers. The chart is now the canonical artefact for
the v4 installer; controller-gen targets it directly per the Phase 0
layout decision.
Chart at installer/charts/educates-installer/:
- Chart.yaml apiVersion v2, kubeVersion >=1.31.0-0, version and
appVersion locked at 4.0.0-alpha.1 (matches the runtime chart's
versioning approach but tracks operator releases independently).
- crds/: the four CRDs from controller-gen, in Helm's reserved
location — installed once on first helm install, not templated, not
deleted on uninstall (mirrors the runtime chart's CRD-shipping
decision).
- templates/rbac/role.yaml: ClusterRole "educates-installer-manager"
generated by controller-gen. Phase 0 RBAC scope is exactly the four
CRDs and their /status + /finalizers — no Secrets/ClusterIssuers/
IngressClasses watches yet (those land in Phase 1 with the Inline
validator).
- templates/rbac/role-binding.yaml, serviceaccount.yaml,
deployment.yaml: hand-written, Helm-templated. Deployment runs the
manager binary with --health-probe-bind-address=:8081, metrics off
by default, leader election off (single replica).
- values.yaml: image as repository + tag (dev placeholder),
imagePullSecrets, resources, nodeSelector, tolerations, affinity,
leaderElection.enabled. Comment block in values.yaml documents the
Phase 0 local-dev workflow (make docker-build + kind load + helm
install).
- NOTES.txt: post-install message naming the four CRDs, calling out
the Phase 0 stub-only state, and listing useful kubectl commands.
Operator changes:
- All four Reconcile() bodies emit a single "Reconciling X" log line
with the request name and return — gives the smoke test something
to grep for. The kubebuilder TODO scaffolding is removed and replaced
with a Phase-pointing doc comment.
- Makefile manifests target now writes to ../charts/educates-installer/
{crds,templates/rbac}/ instead of bin/manifests/. Role name set to
"educates-installer-manager" to match the chart's hand-written
ClusterRoleBinding.
Verified: go build/vet/generate clean, helm lint passes (only the
benign "icon is recommended" info), helm template renders all four
expected resources (ServiceAccount, ClusterRole, ClusterRoleBinding,
Deployment).
Phase 0 step 6 / final: replace the kubebuilder-scaffolded reconciler
tests with Phase 0 CEL validation specs, wire envtest to load CRDs from
the chart, add a local kind-based smoke test, and add a repo-root CI
workflow.
CRD validation tests (envtest, ginkgo):
- EducatesClusterConfig (config group): three specs. A valid
Managed-mode CR named "cluster" is accepted; a CR with name !=
"cluster" is rejected by the singleton CEL; a spec.mode change on
update is rejected by the mode-immutability CEL.
- Platform group: one shared file, one Describe per CRD. Each verifies
singleton-name acceptance and rejection.
- Both suite_test.go files now point at
installer/charts/educates-installer/crds/ instead of the
no-longer-present kubebuilder default config/crd/bases.
- The kubebuilder-scaffolded reconciler tests (one per kind, with TODO
placeholders and incompatible "test-resource" naming) are removed.
Smoke test (hack/smoke-test.sh, local-only):
- Creates kind cluster on demand, builds the operator image with make
docker-build, kind-loads it, helm-installs the educates-installer
chart, applies a minimal EducatesClusterConfig, and asserts the
"Reconciling EducatesClusterConfig" log line appears within 60s.
Tears down on exit unless KEEP_CLUSTER=true.
- Wired into the previously-stubbed make smoke-test target.
CI (.github/workflows/installer-operator-ci.yaml):
- Triggers on changes to installer/operator/, the chart,
go.work/go.work.sum, or the workflow itself.
- Steps: go vet, go build, manifests-drift check, generate-drift check,
make test (envtest), make lint.
- Uses go-version-file pointing at the operator's go.mod so CI tracks
whatever the project declares.
Go-version pin lowered:
- go.work and operator go.mod were both bumped to 1.25.7 by kubebuilder
init. That triggered a "compile: version go1.25.7 does not match go
tool version go1.25.6" warning chain under bash -e, which cascaded
through the test recipe even though tests themselves passed. Lowered
both to 1.25.0, which works under any 1.25.x toolchain and keeps the
workspace consistent with the existing client-programs and
node-ca-injector modules.
Operator README:
- Replaces the kubebuilder TODO-stub README with a tight summary of
layout, the architecture docs, and the make targets.
Phase 1 step 1: extend the EducatesClusterConfig API surface with the
two structural rules deferred from Phase 0 and the inter-CR contract
fields component reconcilers will read.
CEL exclusivity (two new rules on EducatesClusterConfigSpec):
- When mode is Inline, the Managed-mode top-level fields
(infrastructure, ingress, dns, policyEnforcement, imageRegistry) are
forbidden.
- When mode is Managed, spec.inline is forbidden.
Combined with the existing mode-immutability rule, the spec now carries
three CEL invariants. All three are envtest-verified.
Status surface (the inter-CR contract):
- status.mode echoes spec.mode at the time of last successful reconcile
so components can branch without reading spec.
- status.ingress (StatusIngress): domain, ingressClassName,
wildcardCertificateSecretRef (NamespacedSecretRef: namespace + name),
optional caCertificateSecretRef, optional clusterIssuerRef.
- status.policyEnforcement (StatusPolicyEnforcement):
clusterPolicyEngine, workshopPolicyEngine.
- status.imageRegistry (reuses spec ImageRegistry shape): prefix and
pullSecrets, populated even when empty so components see a single
source of truth.
Status fields are populated by the Phase 1 reconciler in the next step.
The Managed-mode-only fields (bundledChartVersions) and conditions
(InfrastructureConfigured, IngressReady, CertificatesReady, DNSReady,
PolicyEnforcementReady) remain deferred to Phase 2/3 alongside their
producing reconcilers.
go build, go vet, make generate/manifests, and make test all pass; the
generated CRD reflects all three CEL rules.
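As a rough illustration of the two exclusivity rules, the following Python sketch returns the spec paths that would be rejected for a given mode. It only models the logic; the real rules are CEL on EducatesClusterConfigSpec, and the helper name is hypothetical.

```python
# Managed-mode top-level fields named in the commit message above.
MANAGED_ONLY_FIELDS = (
    "infrastructure",
    "ingress",
    "dns",
    "policyEnforcement",
    "imageRegistry",
)


def exclusivity_violations(spec: dict) -> list:
    """Return the spec paths that are forbidden for the declared mode."""
    mode = spec.get("mode")
    if mode == "Inline":
        return [f"spec.{name}" for name in MANAGED_ONLY_FIELDS if name in spec]
    if mode == "Managed":
        return ["spec.inline"] if "inline" in spec else []
    return []
```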
Phase 1 step 2: thread the operator's own namespace from the chart
through to the reconciler, and restrict the Secret cache to that
namespace.
- Chart Deployment template: inject OPERATOR_NAMESPACE via the
downward API (fieldRef metadata.namespace).
- cmd/main.go: read OPERATOR_NAMESPACE at startup; fail fast if unset
so a misconfigured Deployment doesn't silently misbehave.
- Manager cache.Options.ByObject restricts &corev1.Secret{} reads to
the operator namespace only — user-supplied Secrets referenced from
spec.inline live there, and the operator has no need to cache
Secrets cluster-wide. ClusterIssuers (cluster-scoped) and
IngressClasses (cluster-scoped) keep cluster-wide cache, as they
must.
- EducatesClusterConfigReconciler gains an OperatorNamespace string
field; main.go threads the value in. The Phase 0 stub doesn't read
it yet, but the wiring is now in place for the Inline validator
(steps 4-5).
go build, go vet, make test all pass; helm lint clean; helm template
shows the new env var injection in the rendered Deployment.
Phase 1 step 3: extend the operator's ClusterRole with read-only access
to the resources Inline-mode validation needs to look up.
- Secrets (core): for the wildcard TLS Secret, optional CA Secret, and
imageRegistry pullSecrets. Reads are cache-restricted to the operator
namespace by the Phase 1 step 2 cache.Options.
- ClusterIssuers (cert-manager.io): when
spec.inline.ingress.clusterIssuerRef is set, the validator checks
existence and the Ready condition.
- IngressClasses (networking.k8s.io): the validator checks the
referenced IngressClass exists.
All three are get/list/watch only; Inline mode never modifies cluster
state. Markers added on the EducatesClusterConfigReconciler so
controller-gen produces them; role.yaml regenerated into the chart.
Phase 1 step 4: implement the EducatesClusterConfig Inline-mode
reconciler. The operator now validates referenced cluster state
and publishes the inter-CR status contract that Phase 4 components
will consume.
Validator (validator.go):
- checkIngressClass: cluster-scoped Get against networkingv1.IngressClass.
- checkWildcardSecret: Get in operator namespace; assert tls.crt and
tls.key keys present.
- checkCASecret: optional; Get in operator namespace; assert ca.crt key
present.
- checkClusterIssuer: optional; unstructured.Unstructured against
cert-manager.io/v1 ClusterIssuer; assert status.conditions[Ready]==True.
IsNoMatchError (cert-manager CRD absent) is surfaced as a validation
error rather than a reconcile retry — matches how a user would
experience it. Vendored cert-manager Go types are deferred to Phase 2
per the recorded decision.
- validationError type carries spec.<field> path + reason so condition
messages name the offending input.
Reconcile flow (educatesclusterconfig_controller.go):
- Add finalizer "educatesclusterconfig.config.educates.dev/finalizer"
on first sight; first pass returns Requeue so the next pass sees a
stable resource version. Phase 1 deletion handler is a no-op; Phase
2 Managed-mode will uninstall charts here in reverse install order.
- For Inline mode: run validator → on success, populate
status.{mode,ingress,policyEnforcement,imageRegistry} and set
Ready=True / ValidationSucceeded=True. On failure: set Phase Degraded
and both conditions to False with the field-specific message.
status.imageRegistry is always populated (empty struct when unset)
so components see a single source of truth.
- For Managed mode: no-op stub until Phase 2.
- Defensive guard: if mode==Inline and spec.inline is nil (CEL bypass),
Degraded with "spec.inline required".
Envtest specs (validator_test.go):
- All-refs-valid → Ready, finalizer set, full status contract populated.
- Wildcard Secret missing → Degraded, message names the field +
"not found".
- Wildcard Secret present without tls.crt → Degraded.
- IngressClass missing → Degraded.
- Optional CA Secret referenced but missing → Degraded.
- Delete clears the finalizer and the apiserver removes the object.
Test helpers: makeWildcardSecret uses Opaque type because
kubernetes.io/tls Secrets are apiserver-validated to require both
tls.crt + tls.key, which would block the missing-key test.
go build / vet / test all pass; config-package coverage 63.2% (the
gap is mostly the unreachable Managed-mode branch in Reconcile, which
becomes covered once Phase 2 implements it).
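The wildcard-Secret check and the field-carrying error can be sketched as follows. This is a Python model under stated assumptions, not the operator's Go code: the `secrets` dict stands in for the API server, and the spec path used for the field name is an assumption (the commit message only confirms that errors carry a spec.<field> path plus a reason).

```python
class ValidationError(Exception):
    """Carries the offending spec field path and a human-readable reason."""

    def __init__(self, field: str, reason: str):
        super().__init__(f"{field}: {reason}")
        self.field = field
        self.reason = reason


def check_wildcard_secret(secrets: dict, namespace: str, name: str) -> None:
    # `secrets` maps (namespace, name) -> Secret data; a stand-in for a Get call.
    # The field path below is an assumed name, used only for illustration.
    field = "spec.inline.ingress.wildcardCertificateSecretRef"
    data = secrets.get((namespace, name))
    if data is None:
        raise ValidationError(field, f"Secret {namespace}/{name} not found")
    for key in ("tls.crt", "tls.key"):
        if key not in data:
            raise ValidationError(field, f"Secret {namespace}/{name} is missing key {key}")
```

Raising a typed error rather than returning a bare string lets the caller copy the field path straight into the condition message.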
configuration-settings documents externalTLSTermination on the three kinds and the ingressOverrides.protocol field (also corrects the SessionManager not-yet-supported list: Secret-sourced themes and imagePrePuller are wired; defaultAccessCredentials, registryMirrors and non-Secret theme sources are rejected). secure-http-connections replaces the standalone-chart workaround with the supported override, noting certificate settings are still required. Migration guide maps clusterIngress.protocol accordingly. decisions.md records the placement rationale; the External-load-balancer follow-up is marked partially landed with the remaining scope.
Verifying the operational.replicas plumbing against the vendored upstream charts showed the shared shape didn't hold: external-dns 1.21.1 hardcodes replicas to 1 and silently swallows the bogus replicaCount value, Kyverno fanning one count across its four controllers conflicts with upstream HA guidance (3+ for the admission controller only), and cert-manager never consumed the block. Remove operational from bundledCertManager and bundledExternalDNS, drop the kyverno.bundled wrapper (it only carried operational), and delete the corresponding renderer plumbing. BundledContour keeps the block; its replicas knob maps to contour.replicaCount as before. Regenerate CRDs, deepcopy, the CLI-embedded chart copy, and the CRD-derived EducatesConfig schema. Amend the r3 draft (dated note in the operational-block pattern section, open item 4 now tracks per-service shapes) and add a decisions-log entry.
Five new samples populating every supported v1alpha1 spec field: 05-managed-full (Managed-mode kitchen sink, GKE-flavoured, with the reserved-but-rejected surface listed in the header), 06-inline-full (generic BYO with every inline field including the cross-namespace CA ref and clusterIssuerRef), and -full variants of the three platform CRs (sessionmanager-full keeps the rejected defaultAccessCredentials and registryMirrors blocks as comments). README gains both table sections and a note that -full files are field references, not starting points. All five strict-decode against the operator's typed API.
gke-full / eks-full / inline-full populate every field of their JSON schema, with toggles set opposite their defaults and explicit service accounts / role ARNs so the new round-trip tests also prove that WithDefaults doesn't overwrite explicit values. Mirrors the existing local-full.yaml + TestLoad_FullLocalConfig_RoundTripsAllFields pair.
Every subchart enabled (lookup-service and remote-access for the first time in any scenario) and every session-manager value block populated. pre-install stages the TLS/CA pair (scenario-02 logic) plus two theme Secrets and a dummy pull Secret; post-deploy asserts in four groups: typed-values serialisation into the educates-config blob, SecretCopier plumbing, optional-subchart rollouts, and the external defaultTheme served by the portal. The chart-values header documents the per-pod operational knobs (image, imagePullSecrets, resources, development.imageRegistry) deliberately left at defaults and where to set them. tests/README gains the missing 07/08 rows alongside the new 09 entry.
New conceptual companion to the CLI/Helm how-to guides: the three installation object layers (CLI config kinds, operator custom resources, Helm charts) and their relations, the five-stage value pipeline with an inspection command for every intermediate (render, helm get values per release, the ECC status contract, the educates-config Secret), a worked example tracing the ingress domain end to end, and an entry-point table for the CLI, GitOps, and standalone-Helm paths.
# Conflicts: # developer-docs/release-procedures.md # node-ca-injector/Dockerfile # vendir.lock.yml # vendir.yml
The imgpkg bundle duplicated what other artifacts already provide: GitHub releases carry the binaries for all four platforms, and the educates-cli image covers container use. Nothing in the repo consumed the bundle, and the push-client-programs Makefile target had a copy-paste bug (built linux twice, never darwin) that nobody noticed. Removes the publish-client-programs CI job, the push-client-programs Makefile target, and rewrites the docs to point at GitHub releases, workflow-run artifacts, and the educates-cli image instead.
The local CLI config snippets in the getting-started docs still used the v3 key names (localKindCluster, localDNSResolver, podCIDR/ serviceCIDR); the EducatesLocalConfig schema names these cluster, resolver and podSubnet/serviceSubnet. build-instructions.md still described the v3 source-deploy flow entirely: the developer-testing values file, the deploy-platform / push-installer-bundle / deploy-platform-app make targets (removed with the Carvel installer) and the create-cluster --version flag. Rewritten around the v4 flow: educates local config init/edit, educates admin platform deploy --local-config, imageVersions overrides for locally built images, and the operator docker-build + kind load loop.
Fills in the New Features placeholder with the operator-based install pipeline (educates-installer chart, four CRDs, Managed/Inline modes, bundled cluster services, ACME DNS01), the standalone runtime chart, the kind-based CLI configuration with published JSON schemas, the new admin platform / local cluster / local config commands, local-CA TLS, externalTLSTermination, and the OCI chart + image-list publishing. Prepends the v3-removal entries to Features Changed: Carvel installer gone with no in-place upgrade, values.yaml to config.yaml migration and key renames, no certificate-less installs, imageCache renamed to imagePrePuller, and the educates-client-programs bundle removal.
Every user-facing change on this branch must update project-docs/release-notes/version-4.0.0.md before it counts as complete; internal-only changes are exempt.
The generated local CA was only reachable as base64 inside the cached Secret YAML, so there was no reasonable way to tell users how to make their browser trust workshop URLs. 'educates local secrets export NAME --pem' prints the certificate of a cached TLS/CA secret as PEM (never the key), and 'local secrets add ca' now prints the export command and a docs pointer after generating a CA. The quick start gains a 'Trusting the workshop certificates' section with per-platform trust-store import instructions, and the 4.0.0 release notes cover the new flag.
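The export boils down to base64-decoding the certificate entry and never touching the key. A minimal Python sketch, assuming the cached Secret follows the usual Kubernetes layout (a `data` map with `tls.crt` or `ca.crt`); the function name and the lookup order are illustrative, not the CLI's actual implementation:

```python
import base64


def export_certificate_pem(secret: dict) -> str:
    """Return the PEM certificate of a TLS/CA secret; tls.key is never read."""
    data = secret.get("data", {})
    for key in ("tls.crt", "ca.crt"):  # assumed lookup order
        if key in data:
            return base64.b64decode(data[key]).decode("ascii")
    raise KeyError("secret has no tls.crt or ca.crt entry")
```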
Completes the chart bump started in the working tree (Makefile + SHA256SUMS + downloaded tarballs): embed.go now embeds contour-0.6.0 (appVersion 1.33.5) and kyverno-3.8.1 (appVersion v1.18.1), and the old tarballs are removed. Workload-name readiness gates re-verified by rendering both new charts: contour still produces contour-contour / contour-envoy with no webhooks, kyverno still produces the same four controller Deployments. The bundled kyverno policies in the session-manager chart move from kyverno/policies release-1.15 to release-1.18 to match: four cluster-policies pick up modernised CEL (variables blocks, optional chaining), the workshop-policies are unchanged, and policy names are stable so per-workshop cloning is unaffected. The session-manager subchart tarball is repackaged so the operator installs the refreshed policies. A new directory-consistency test ties the Makefile VENDORED_CHARTS list, SHA256SUMS, embed.go and the tarballs on disk together, so a half-finished bump or stale tarball now fails 'make test' instead of silently shipping the old chart. Both vendoring READMEs now document the full upgrade workflow, including the kyverno-policies follow-up.
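The idea behind the directory-consistency test is a four-way set comparison. The sketch below shows that idea in Python; the real test lives in the Go test suite, and the function name and inputs here are hypothetical (parsing the Makefile, SHA256SUMS and embed.go is left out).

```python
def chart_set_drift(makefile_list, sha256sums, embedded, on_disk):
    """Report, per source, the chart names it lacks relative to the others."""
    sources = {
        "Makefile": set(makefile_list),
        "SHA256SUMS": set(sha256sums),
        "embed.go": set(embedded),
        "disk": set(on_disk),
    }
    expected = set().union(*sources.values())
    drift = {}
    for name, present in sources.items():
        missing = expected - present
        if missing:
            drift[name] = sorted(missing)
    return drift
```

An empty result means all four places agree; a stale tarball left on disk shows up as an entry missing from the other three sources.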
…ge overrides The ImageOverride doc claimed any image could be overridden by short name, but three names silently did nothing: the session-manager chart-pod image, the pre-puller pause image and node-ca-injector all live outside the chart's imageVersions inventory. applySMImageValues now routes those three to the chart values that actually control them (image, imagePrePuller.pauseImage, and the node-ca-injector subchart's image via renderNodeCAInjectorValues); everything else flows through the inventory as before. The imagePrePuller enabled-toggle writer and the pauseImage router compose into one map instead of clobbering. Groundwork for the local build flow, where a dev-built CLI defaults every core image to the local registry.
…registry A CLI whose compiled-in version is not a semver (i.e. not stamped by the release pipeline — 'latest' from make, 'develop' fallback) now defaults an imageVersions entry for every core platform image to <imageRepository>/educates-<name>:<version>, and pins the operator pullPolicy to Always so rebuilt dev tags re-pull. Release binaries are semver-stamped and skip all of this; user-supplied entries always win. With the root Makefile building the same image set into localhost:5001, this is what makes 'make' + 'educates local cluster create' deploy the locally built system with zero manual config.
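The defaulting rule can be sketched in a few lines of Python. The semver pattern is a simplified assumption (the CLI's actual check is not shown in this commit message), and the function and parameter names are hypothetical.

```python
import re

# Simplified semver check: MAJOR.MINOR.PATCH with optional pre-release/build.
SEMVER = re.compile(r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$")


def default_image_versions(version, image_repository, core_images, user_entries):
    """Return the effective imageVersions map for a given CLI build."""
    if SEMVER.match(version):
        # Release binaries are semver-stamped and skip the defaulting.
        return dict(user_entries)
    defaults = {
        name: f"{image_repository}/educates-{name}:{version}" for name in core_images
    }
    defaults.update(user_entries)  # user-supplied entries always win
    return defaults
```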
imageVersions entries named secrets-manager and lookup-service were silently appended to SessionManager.spec.images.overrides, where the chart's inventory ignores them — those components have their own CRs with ImageRef-shaped spec.image fields the reconcilers already consume. The translator now routes those two names there (split into repository + tag) and excludes them from the SessionManager overrides, uniformly across the Local, Inline, GKE and EKS kinds.
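A sketch of the routing, in Python for brevity. It assumes tag-only references (digest references are out of scope here), and the helper names and the shape of the returned map are illustrative rather than the translator's real types.

```python
# Components that have their own CR with an ImageRef-shaped spec.image.
OWN_CR = {"secrets-manager": "SecretsManager", "lookup-service": "LookupService"}


def split_image_ref(ref: str):
    """Split 'repo:tag' into (repo, tag); digest refs are not handled."""
    slash = ref.rfind("/")
    colon = ref.rfind(":")
    # Only a colon after the last slash is a tag separator, so a registry
    # port such as localhost:5001 is not mistaken for a tag.
    if colon > slash:
        return ref[:colon], ref[colon + 1:]
    return ref, ""


def route_image_versions(entries: dict) -> dict:
    routed = {"SessionManager": {}}
    for name, ref in entries.items():
        kind = OWN_CR.get(name)
        if kind:
            repository, tag = split_image_ref(ref)
            routed[kind] = {"repository": repository, "tag": tag}
        else:
            routed["SessionManager"][name] = ref
    return routed
```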
Plain 'make' now produces a complete locally-testable system: the
educates CLI built with dev ldflags (projectVersion=latest,
imageRepository=localhost:5001), the always-on local registry
auto-deployed via that CLI, all core platform images and the operator
image pushed to localhost:5001, followed by a printed 'educates local
cluster create' next step.
Mechanics, per the build-harmonization direction in the integration
plan (the cheap slice; component Makefiles and CI repointing stay
deferred):
- One image-% pattern rule replaces the 17 near-identical docker build
recipes; per-image context dirs and build args via IMAGE_DIR.<name> /
IMAGE_BUILD_ARGS.<name>. The operator image is now buildable from the
root (image-operator), preceded by refresh-operator-embeds so the
embedded subchart tarballs are always fresh (with a content-aware
restore so helm's non-reproducible gzip doesn't dirty the tree).
- Images build for the current host architecture ONLY unless
TARGET_PLATFORMS is set explicitly, so there is no more silent
QEMU-emulated multiarch on push builds. The CLI builds for the host
platform.
- build-cli stamps the dev ldflags (overridable via CLI_VERSION /
CLI_IMAGE_REPOSITORY) and refreshes the CLI-embedded chart + schemas
first. build-client-programs kept as an alias.
- Old build-<image> targets are dropped in favor of image-<name>;
verify-installer-chart / verify-cli-schemas / embed-installer-chart /
generate-cli-schemas keep their recipes (CI contract). The 130-line
drifted header is replaced by a short one + 'make help'.
Verified: release-stamped CLI renders zero localhost references against
a clean config; dev CLI renders all 12; verify targets green.
build-instructions.md now leads with 'make' + 'educates local cluster create' as the complete from-source loop, documents the TARGET_PLATFORMS host-arch-only default and the other knobs, the image-<name> targets, the dev-vs-release CLI behavior, and the workshop-image opt-in path. The 4.0.0 release notes gain the imageVersions routing + dev-CLI defaulting entry.
go vet flags the unbuffered os.Signal channel passed to signal.Notify (a signal arriving before the goroutine is ready would be dropped). Came in from develop via the back-merge; one-line fix keeps client-programs CI vet green.
…nstaller # Conflicts: # client-programs/Dockerfile
…e workshop ClusterPolicy
Kyverno 1.18 introduces the ValidatingPolicy type (policies.kyverno.io,
ValidatingAdmissionPolicy-shaped), the recommended successor to the CEL
ClusterPolicy form and the fix for the CEL admission warnings the old
set emitted. The bundled policy set now ships as ValidatingPolicy:
- cluster-policies (PSS baseline + restricted) re-vendored from
kyverno/policies@release-1.18 pod-security-vpol, rendered cluster-wide
as ValidatingPolicy (Audit). Drops the upstream-curated
host-ports-range alternate (Audit-only, no enforcement change).
- workshop-policies: 7 re-vendored from best-practices-vpol /
other-vpol; the 3 nginx-ingress policies and the Educates-internal
require-ingress-session-name have no upstream -vpol variant and are
hand-ported to ValidatingPolicy CEL (require-ingress-session-name
moves off the legacy JMESPath apiCall to read the session name from
namespaceObject, fail-closed when the label is absent).
session-manager/handlers/kyverno_rules.py is rewritten to scope each
policy to a workshop environment's session namespaces. New Policy types
get a per-env copy with spec.matchConstraints.namespaceSelector
injected and validationActions from the workshop action
(Enforce->Deny, Audit->Audit). Legacy ClusterPolicy supplied via
workshopSecurity.additionalKyvernoPolicies is still scoped (merged as
before) but logs a deprecation warning, tracking Kyverno's
ClusterPolicy removal in 1.20. Unrecognised kinds are skipped with a
warning. Core logic factored out and covered by a new pytest.
session-manager gains an RBAC grant for
validatingpolicies.policies.kyverno.io (the Kyverno-shipped
admin:policies role covers only kyverno.io); the legacy binding is kept
for the deprecated path. Release notes document the change and the
ClusterPolicy deprecation.
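The per-environment scoping described above might look roughly like this. It is a sketch, not the contents of kyverno_rules.py: the selector label key and the naming scheme for the copy are hypothetical, and only the ValidatingPolicy path is modelled (the legacy ClusterPolicy merge is omitted).

```python
import copy

# Workshop action -> ValidatingPolicy validationActions, per the mapping above.
ACTIONS = {"Enforce": "Deny", "Audit": "Audit"}


def scope_validating_policy(policy, environment, action,
                            selector_label="example.test/environment"):
    """Return a per-environment copy of a ValidatingPolicy, or None if skipped."""
    if policy.get("kind") != "ValidatingPolicy":
        return None  # other kinds are handled, or skipped with a warning, elsewhere
    scoped = copy.deepcopy(policy)  # never mutate the shared bundled policy
    # Hypothetical naming scheme so each environment gets a distinct object.
    scoped["metadata"]["name"] = f"{environment}-{policy['metadata']['name']}"
    spec = scoped.setdefault("spec", {})
    spec.setdefault("matchConstraints", {})["namespaceSelector"] = {
        "matchLabels": {selector_label: environment}
    }
    spec["validationActions"] = [ACTIONS[action]]
    return scoped
```

The deep copy matters because the same bundled policy is cloned once per workshop environment.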
On kind the cluster maps host 80/443 to the node's 80/443, so Envoy must
bind those node ports via hostPort to receive ingress traffic. The v4
Contour reconciler only set envoy.service.type and dropped v3's
"useHostPorts: true" for kind, so with a ClusterIP/LoadBalancer-pending
Envoy nothing listened on the node's 443 and TLS connections to workshop
and portal URLs were reset during the handshake (the cert and Envoy
itself were fine).
renderContourValues now enables envoy.useHostPort.{http,https} whenever
envoyServiceType is ClusterIP — a ClusterIP ingress controller is
non-functional without it, and this mirrors v3's kind topology
(ClusterIP service + host ports). The EducatesLocalConfig translator now
selects envoyServiceType: ClusterIP for the kind-based local install
(it previously left it unset, defaulting to LoadBalancer, which never
gets an address on kind). Cloud installs keep LoadBalancer/NodePort and
get no hostPort. Unit tests cover both the operator value mapping and
the translator invariant.
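The value mapping is small enough to sketch directly. This Python model only mirrors the rule stated above (host ports whenever the Envoy service type is ClusterIP); the real renderContourValues is Go, and the function name and any other values it sets are not shown here.

```python
def render_contour_values(envoy_service_type: str) -> dict:
    """Model of the Envoy portion of the Contour chart values."""
    envoy = {"service": {"type": envoy_service_type}}
    if envoy_service_type == "ClusterIP":
        # A ClusterIP Envoy receives no ingress traffic unless it binds the
        # node's ports, so host ports are enabled for both listeners.
        envoy["useHostPort"] = {"http": True, "https": True}
    return {"envoy": envoy}
```

Cloud installs that pass LoadBalancer or NodePort get no useHostPort block at all.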
…nstaller # Conflicts: # assets-server/Dockerfile
Encodes how to update the third-party frontend libraries vendored into the workshop dashboard theme (Bootstrap, Font Awesome, jQuery, Underscore.js, JSONForm, js-yaml): upstream sources and versioned URLs, the zip-subset vs single-file download shapes, target paths, and verification. Also documents keeping each library aligned with the matching npm dependency in the renderer and gateway package.json files, which bundle the same libraries via esbuild.
…ironment)
Both Go builder stages lacked the --platform=$BUILDPLATFORM prefix, so
multiarch image builds ran the Go toolchain under QEMU for the
non-native architecture instead of cross-compiling.
- installer/operator/Dockerfile already had the cross-compile machinery
(ARG TARGETOS/TARGETARCH, CGO_ENABLED=0 GOOS/GOARCH go build); only
the --platform prefix was missing. Added it (matches node-ca-injector
/ client-programs / assets-server).
- workshop-images/base-environment/Dockerfile built git-serve natively
(no GOOS/GOARCH); pinned the builder to BUILDPLATFORM and added the
TARGETOS/TARGETARCH + CGO_ENABLED=0 GOOS/GOARCH cross-compile.
git-serve v0.0.5 cross-compiles cleanly to linux/amd64 and
linux/arm64 (verified, statically linked); the output name and
downstream COPY are unchanged.
All five Go Dockerfiles now use the same cross-compile pattern.
…e-notes skills
- New educates-upgrade-cluster-services skill: upgrading the vendored
upstream cluster-service charts (cert-manager, Contour, Kyverno,
external-dns) the operator installs. Covers the
four-places-in-lockstep model, per-chart sources (incl. Kyverno's
chart-vs-binary version resolution), vendor-charts/verify/test flow,
and the reconciler re-verification step. The vendored-charts README
gains the matching Kyverno version-resolution note so the two stay in
agreement.
- educates-upgrade-go: add the installer/operator module (go.mod +
Dockerfile), drop the removed tunnel-manager go.mod, reflect
go-version-file-driven CI, and document the
--platform=$BUILDPLATFORM cross-compile pattern (now on all five Go
Dockerfiles).
- educates-release-notes: fix tag lookup (no 'v' prefix), replace
'Upcoming Changes' with a Deprecations section, drop a stale path.
1c74329 to
540adef
The client-programs and installer-operator CI workflows duplicated
their check lists in YAML, letting them drift from how the code is
actually built; three failures hid behind that gap. Make the Makefile
the single source of truth and have the workflows call it.
New root Makefile targets, mirrored 1:1 by the workflows:
- ci / ci-cli / ci-operator (go vet/build, drift checks, envtest, lint
and chart-version lint), plus stage-renderer-files, factored out and
reused by build-cli.
- client-programs-ci.yaml and installer-operator-ci.yaml now just run
`make ci-cli` / `make ci-operator` after checkout + setup-go. The
operator's separate chart-sync-lint job folds into ci-operator.
Fix the checks this surfaced, each previously masked by an earlier
failing step:
- Stage the gitignored renderer theme embed dir before go vet/build:
hugo.go embeds it via //go:embed all:files/*, so a bare checkout
matched nothing and client-programs CI failed every run.
- operator validator: read the referenced ClusterIssuer via APIReader
in checkClusterIssuer. The deferred ClusterIssuer watch uses an
unstructured informer while the read used the typed cached client
(two independent caches), so on deletion the watch fired the
reconcile but the stale typed read left status wedged at Ready, with
nothing re-triggering (flaky "flips Ready to Degraded" envtest spec
under load). Matches the existing checkCASecret APIReader precedent.
- client-programs CompressDirToFile discarded the os.Create error
(errors.Errorf with no format verb, result not returned); return a
wrapped error instead.
- operator test files tripped golangci-lint once make lint finally ran:
a goconst "latest" constant, three lll line-length wraps, and two
modernize strings.SplitSeq updates.
Document the targets and the theme-staging / GOTOOLCHAIN gotchas in
developer-docs/build-instructions.md, the operator README, and
CLAUDE.md.
Also bump GitHub Actions versions across the workflows (checkout,
setup-go, docker/*, cache, artifact, pages, gh-release).
540adef to
2fd33cc
Work on migrating Carvel installer to Helm installer.